NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Exploring the Link between Cognitive Abilities and Data Science Skills using Alternative Raven’s Progressive Matrices

Farzan, Farshid; Mashrique, Hasan; Olney, Andrew M (October 2025, CEUR)

Free, publicly-accessible full text available October 1, 2026
Learning by Correcting AI Errors: Effort is Essential

https://doi.org/10.1007/978-3-031-98459-4_11

Olney, Andrew M; Cade, Whitney L (July 2025, Springer Nature Switzerland)

Free, publicly-accessible full text available July 20, 2026
Dataset Personalization Methods based on LLMs for Data Science Education: A Comparative Study of Rescaling and Sampling Approaches

https://doi.org/10.1145/3698205.3733936

Barboza, Luiz; Farzan, Farshid; Olney, Andrew M (July 2025, ACM)

Free, publicly-accessible full text available July 17, 2026
Efficacy of a Computer Tutor that Models Expert Human Tutors

https://doi.org/10.1007/978-3-031-98462-4_38

Olney, Andrew M; D’Mello, Sidney K; Person, Natalie; Cade, Whitney; Hays, Patrick; Dempsey, Claire W; Lehman, Blair; Williams, Betsy; Graesser, Art (July 2025, Springer Nature Switzerland)

Free, publicly-accessible full text available July 20, 2026
Visual Data Science with Blockly-DS

https://doi.org/10.1145/3626253.3633408

Barboza, Luiz; Ferreira_Mello, Rafael; Souza_Teixeira, Erico; Olney, Andrew M (March 2024, ACM)

Full Text Available
Generating multiple choice questions from a textbook: LLMs match human performance on most metrics

Olney, Andrew M (September 2023, CEUR workshop proceedings)
Moore, S; Stamper, J; Cao, T; Liu, Z; Hu, X; Lu, Y; Liang, J; Khosravi, H; Denny, P; Singh, A (Ed.)
Multiple choice questions are traditionally expensive to produce. Recent advances in large language models (LLMs) have led to fine-tuned LLMs that generate questions competitive with human-authored questions. However, the relative capabilities of ChatGPT-family models have not yet been established for this task. We present a carefully-controlled human evaluation of three conditions: a fine-tuned, augmented version of Macaw, instruction-tuned Bing Chat with zero-shot prompting, and humanauthored questions from a college science textbook. Our results indicate that on six of seven measures tested, both LLM’s performance was not significantly different from human performance. Analysis of LLM errors further suggests that Macaw and Bing Chat have different failure modes for this task: Macaw tends to repeat answer options whereas Bing Chat tends to not include the specified answer in the answer options. For Macaw, removing error items from analysis results in performance on par with humans for all metrics; for Bing Chat, removing error items improves performance but does not reach human-level performance.
more » « less
Full Text Available
Raising the Roof: Situating Verbs in Symbolic and Embodied Language Processing

https://doi.org/10.1111/cogs.13442

Hollander, John; Olney, Andrew (April 2024, Cognitive Science)

Abstract Recent investigations on how people derive meaning from language have focused on task‐dependent shifts between two cognitive systems. The symbolic (amodal) system represents meaning as the statistical relationships between words. The embodied (modal) system represents meaning through neurocognitive simulation of perceptual or sensorimotor systems associated with a word's referent. A primary finding of literature in this field is that the embodied system is only dominant when a task necessitates it, but in certain paradigms, this has only been demonstrated using nouns and adjectives. The purpose of this paper is to study whether similar effects hold with verbs. Experiment 1 evaluated a novel task in which participants rated a selection of verbs on their implied vertical movement. Ratings correlated well with distributional semantic models, establishing convergent validity, though some variance was unexplained by language statistics alone. Experiment 2 replicated previous noun‐based location‐cue congruency experimental paradigms with verbs and showed that the ratings obtained in Experiment 1 predicted reaction times more strongly than language statistics. Experiment 3 modified the location‐cue paradigm by adding movement to create an animated, temporally decoupled, movement‐verb judgment task designed to examine the relative influence of symbolic and embodied processing for verbs. Results were generally consistent with linguistic shortcut hypotheses of symbolic‐embodied integrated language processing; location‐cue congruence elicited processing facilitation in some conditions, and perceptual information accounted for reaction times and accuracy better than language statistics alone. These studies demonstrate novel ways in which embodied and linguistic information can be examined while using verbs as stimuli.
more » « less
Generating Multiple Choice Questions with a Multi-Angle Question Answering Model

https://doi.org/10.5281/zenodo.7761561

Olney, Andrew M. (January 2023, Zenodo)
Fancsali, Stephen E.; Rus, Vasile (Ed.)
Multi-angle question answering models have recently been proposed that promise to perform related tasks like question generation. However, performance on related tasks has not been thoroughly studied. We investigate a leading model called Macaw on the task of multiple choice question generation and evaluate its performance on three angles that systematically reduce the complexity of the task. Our results indicate that despite the promise of generalization, Macaw performs poorly on untrained angles. Even on a trained angle, Macaw fails to generate four distinct multiple-choice options on 17% of inputs. We propose augmenting multiple- choice options by paraphrasing angle input and show this increases overall success to 97.5%. A human evaluation comparing the augmented multiple-choice questions with textbook questions on the same topic reveals that Macaw questions broadly score highly but below human questions.
more » « less
Full Text Available
Scaffolding Computational Thinking Through Block Coding: A Learner Experience Design Study

https://doi.org/10.1007/s10758-022-09636-4

Tawfik, Andrew A.; Payne, Linda; Olney, Andrew M. (December 2022, Technology, Knowledge and Learning)

Full Text Available
Assessing Readability by Filling Cloze Items with Transformers

https://doi.org/10.1007/978-3-031-11644-5_25

Olney, Andrew M. (July 2022, Proceedings of the 22nd International Conference on Artificial Intelligence in Education)

Full Text Available

« Prev Next »

Search for: All records